Overview

The goal of this document is to guide the development of habitat suitability analyses, including (1) exploring the distributions of and correlations between key variables and (2) testing analysis methods to understand relationships.

Feather River will be used as a case study location, with the goal of creating a more streamlined workflow that could be applied elsewhere. There are two relevant datasets: (1) the mini snorkel data and (2) the intermediate-level ongoing snorkel survey. These datasets use different methodologies, and part of these analyses will explore the differences between the two data collection methods and the resulting habitat data.

This markdown focuses on cluster analysis.

Mini Snorkel

Description of sampling

Key variables

Data processing considerations

Distributions of key variables

species

Chinook, steelhead, tule perch, and speckled dace are observed.

fork length

fork length for chinook

Most Chinook observations are of fry (~40 mm).

fork length for steelhead

Most steelhead observations are for smaller fish but there is some variation.

depth

depth of microhabitat

depth of fish observation

velocity

velocity of microhabitat

velocity of fish observations

count

count of all species

count by species

Most observations are of Chinook salmon

count of chinook

count of steelhead (wild)

count of steelhead (clipped)

percent cover

summary of percent cover by cover type where cover > 0% and Chinook salmon observed

summary of percent cover by cover type where cover > 0% and Steelhead observed

percent cover by transect code

The following plots summarize percent cover by type and transect code to help describe the types of habitats surveyed. This information needs to be summarized better.

small woody cover

large woody cover

submerged vegetation

undercut bank

half meter overhead

more than half meter overhead

cover - presence/absence

We transformed cover to presence/absence: cover is present if any of the cover types is > 20%.

instream cover presence (1) and absence (0) for Chinook salmon

Instream cover is present when any of the instream cover types is greater than 20%.

overhead cover presence (1) and absence (0) for Chinook salmon

Overhead cover is present when any of the overhead cover types is greater than 20%.
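
The presence/absence rule can be sketched in a few lines. This is a minimal Python illustration, not the code used to produce the plots in this document. The grouping of columns into instream and overhead is an assumption based on the variable names listed under Data inputs, `percent_cover_more_than_half_meter_overhead` is a hypothetical column name, and treating missing values as 0% cover is also an assumption.

```python
# Threshold from the text: a cover type counts as present when it exceeds 20%.
COVER_THRESHOLD = 20

# Assumed grouping of cover columns; adjust to match the real dataset.
INSTREAM_COLS = [
    "percent_small_woody_cover_inchannel",
    "percent_large_woody_cover_inchannel",
    "percent_submerged_aquatic_veg_inchannel",
    "percent_undercut_bank",
]
OVERHEAD_COLS = [
    "percent_cover_half_meter_overhead",
    "percent_cover_more_than_half_meter_overhead",  # hypothetical column name
]


def cover_present(row, columns, threshold=COVER_THRESHOLD):
    """Return 1 if any listed cover type exceeds the threshold, else 0.

    Missing values are treated as 0% cover, which is an assumption.
    """
    return int(any((row.get(col) or 0) > threshold for col in columns))


# Made-up example rows for illustration only.
rows = [
    {"percent_small_woody_cover_inchannel": 30, "percent_cover_half_meter_overhead": 10},
    {"percent_submerged_aquatic_veg_inchannel": 20, "percent_cover_half_meter_overhead": 45},
]
flags = [
    (cover_present(r, INSTREAM_COLS), cover_present(r, OVERHEAD_COLS)) for r in rows
]
```

Because the comparison is strict, a cover type at exactly 20% does not count as present, which follows the "> 20%" wording used here.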

instream cover presence (1) and absence (0) for Steelhead

overhead cover presence (1) and absence (0) for Steelhead

Correlations

We checked the correlations between the cover and substrate variables and did not find any that were highly correlated.

No correlation was found between distance to bottom and velocity.

Cover

Percent no cover in channel is highly inversely correlated with submerged aquatic vegetation.

Percent no cover overhead is highly inversely correlated with cover overhead.

Substrate

None of the percent substrate variables are correlated.
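
A pairwise correlation screen of the kind described in this section could look like the sketch below. It is written in Python for illustration and is not the code behind the results reported here. The 0.7 cutoff for "highly correlated" is an assumption, `percent_no_cover_inchannel` is a hypothetical column name, and all values are made up.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def high_correlations(columns, cutoff=0.7):
    """Return (name_a, name_b, r) for every pair with |r| >= cutoff."""
    flagged = []
    for (name_a, a), (name_b, b) in combinations(columns.items(), 2):
        r = pearson(a, b)
        if abs(r) >= cutoff:
            flagged.append((name_a, name_b, round(r, 3)))
    return flagged


# Made-up values for illustration only.
columns = {
    "percent_no_cover_inchannel": [100, 80, 40, 10, 0],  # hypothetical name
    "percent_submerged_aquatic_veg_inchannel": [0, 15, 55, 85, 95],
    "focal_velocity": [0.3, 0.1, 0.4, 0.2, 0.35],
}
flagged = high_correlations(columns)
```

With these toy values, only the no-cover and submerged-vegetation pair is flagged, and its coefficient is strongly negative.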

Cluster analysis

The goal of the cluster analysis is to identify groupings of fish observations and what best describes those groupings. Unlike a typical habitat analysis, this will focus only on the characteristics of fish observations.

TODO

  • try with a density variable
  • try cluster analysis or PCA to come up with substrate/cover groups

Key variables

  • date (or month)
  • species (may decide to filter to just Chinook and Steelhead)
  • count (note that this variable does not seem to be working well)
  • dist_to_bottom
  • fl_mm (note that this variable does not seem to be working well)
  • focal_velocity
  • some measure of cover
  • some measure of substrate

Data inputs

I tried the cluster analysis multiple times with slight variations in the data used. These are the data ultimately included:

  • dist_to_bottom
  • focal_velocity
  • percent_small_gravel_substrate
  • percent_large_gravel_substrate
  • percent_cobble_substrate
  • percent_boulder_substrate
  • percent_small_woody_cover_inchannel
  • percent_large_woody_cover_inchannel
  • percent_submerged_aquatic_veg_inchannel
  • percent_undercut_bank
  • percent_cover_half_meter_overhead
  • surface_turbidity
  • species_cat (categorized chinook as 1, steelhead as 2 and other as 3)
  • month (transformed date of observation to month)
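
The two derived inputs at the end of this list, `species_cat` and `month`, could be built as in the following sketch. It is a Python illustration under stated assumptions rather than the code used for this analysis: matching species by substring, treating missing species as "other", and the example dates are all assumptions.

```python
from datetime import date

# Coding from the text: chinook = 1, steelhead = 2, everything else = 3.
SPECIES_CODES = {"chinook": 1, "steelhead": 2}


def species_cat(species):
    """Map a species name to the numeric category used as a cluster input."""
    name = (species or "").strip().lower()
    # Assumption: any name containing "chinook" or "steelhead" belongs to that group.
    for key, code in SPECIES_CODES.items():
        if key in name:
            return code
    return 3


def month_of(observation_date):
    """Return the month (1-12) of an observation date."""
    return observation_date.month


# Made-up observations for illustration only.
observations = [
    ("Chinook salmon", date(2001, 3, 14)),
    ("steelhead", date(2001, 5, 2)),
    ("tule perch", date(2001, 7, 21)),
]
encoded = [(species_cat(s), month_of(d)) for s, d in observations]
```

One thing to keep in mind with this coding is that a distance-based clustering treats 1, 2, and 3 as ordered numbers, so "other" ends up farther from Chinook than steelhead is.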

Notes

  • If we add count or fl_mm to the inputs above, we end up with one really big cluster and 2 smaller ones.
  • If we remove the substrate variables and add count, we end up with one really big cluster and 2 smaller ones.
  • If we do not filter the data to only fish observations and include count, we end up with 4 clusters, where 3 are large (~1500) and one is about 500. This may be worth digging into a little further.
  • I was thinking that we might see a fl_mm effect, but I think there are too few large fish. I decided to filter out large fish because they are not the target of the study and may therefore be leading to erroneous results.

Dendrogram of clusters

The dendrogram visualizes the connections between datapoints. Need to clean this plot up.
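
For readers unfamiliar with how a dendrogram is built, the sketch below shows a naive agglomerative clustering that records the height of each merge. It is a Python illustration only. Complete linkage and Euclidean distance are assumptions, since this document does not state which linkage or distance was used, and the points are made-up (dist_to_bottom, focal_velocity) pairs.

```python
import math


def agglomerate(points):
    """Naive complete-linkage agglomerative clustering.

    Returns a list of (height, members_a, members_b) merges in merge order,
    where members are indices into `points`.
    """
    clusters = [[i] for i in range(len(points))]
    merges = []

    def linkage(a, b):
        # Complete linkage: distance between the two farthest members.
        return max(math.dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > 1:
        # Find the closest pair of clusters under the linkage rule.
        height, ia, ib = min(
            (linkage(clusters[x], clusters[y]), x, y)
            for x in range(len(clusters))
            for y in range(x + 1, len(clusters))
        )
        merges.append((height, clusters[ia], clusters[ib]))
        merged = clusters[ia] + clusters[ib]
        clusters = [c for k, c in enumerate(clusters) if k not in (ia, ib)]
        clusters.append(merged)
    return merges


# Made-up points for illustration only.
points = [(0.1, 0.2), (0.12, 0.22), (0.9, 1.0), (0.95, 1.05), (0.5, 3.0)]
merges = agglomerate(points)
heights = [h for h, _, _ in merges]
```

Each merge height becomes the vertical position of a join in the dendrogram, and the same heights feed the scree plot. This brute-force version is only practical for small examples.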

Scree plot

The elbow of the scree plot suggests a suitable number of clusters. The table below lists the heights of the last merges in the clustering.

##          Height JoinsThis WithThis
## [359,] 207.2381       316      349
## [360,] 210.9009       326      357
## [361,] 224.6831       351      355
## [362,] 259.7803       322      353
## [363,] 262.5066       345      350
## [364,] 288.6768       360      363
## [365,] 301.4911       344      359
## [366,] 354.1795       362      365
## [367,] 385.4339       358      361
## [368,] 510.2824       366      367
## [369,] 683.1727       364      368
## [370,] 695.0076       356      369
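
One simple way to read an elbow from these merge heights is to rank the gaps between successive heights, since a large gap means the next merge joins clusters that are far apart. The Python sketch below applies that heuristic to the heights printed above. It is not necessarily how the number of clusters was chosen in this analysis, and it assumes that merge 370 is the final merge, which would mean 371 observations.

```python
# Merge heights copied from the printed output (merge index -> height).
heights = {
    359: 207.2381,
    360: 210.9009,
    361: 224.6831,
    362: 259.7803,
    363: 262.5066,
    364: 288.6768,
    365: 301.4911,
    366: 354.1795,
    367: 385.4339,
    368: 510.2824,
    369: 683.1727,
    370: 695.0076,
}

# Assumption: merge 370 is the final merge, so there are 371 observations.
N_OBSERVATIONS = 371


def rank_cuts(heights, n_obs):
    """Rank candidate cluster counts by the height gap crossed when cutting.

    Cutting the tree between merge k and merge k + 1 leaves n_obs - k clusters.
    Returns (gap, n_clusters) pairs, largest gap first.
    """
    steps = sorted(heights)
    gaps = []
    for lower, upper in zip(steps, steps[1:]):
        gap = heights[upper] - heights[lower]
        gaps.append((round(gap, 4), n_obs - lower))
    return sorted(gaps, reverse=True)


ranked = rank_cuts(heights, N_OBSERVATIONS)
top_two = [n_clusters for _, n_clusters in ranked[:2]]
```

On these heights the two largest gaps correspond to cutting at 3 and at 4 clusters.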

Results

Number of clusters

Based on analysis of multiple indices, 3-4 clusters is the best fit. Note that this could be looked into further to confirm this is the best number of clusters. I decided to go with 4 clusters based on the results of the scree plot.

The 4 clusters have very similar numbers (and sizes) of fish observations across a similar distribution of months. The species observed are similar though the low velocity and high aquatic vegetation cluster has more steelhead. The defining characteristics include the following:

  • The deep cluster with large gravel
  • The high cover (large and small wood and overhead) cluster
  • The high(er) velocity cluster with no cobble or boulder and high small gravel
  • Lowest velocity and least turbid with high submerged aquatic vegetation, some boulder, and high cobble

The following includes a number of plots that could be used to visualize these results

Depth and velocity

There are small differences in local depth and velocity. There is a trend that can be observed but these results may not be significant.

Substrate

There are very clear differences in percent of small gravel, large gravel, and cobble between groups. There are smaller differences in percent boulder.

Cover

There are very clear differences in percent submerged aquatic vegetation and percent cover overhead. There are smaller differences in percent small woody cover. There are small differences in percent large woody cover and undercut banks.

Intermediate Snorkel

This section is currently in progress because some data processing is still needed.

Data wrangling fixes for the intermediate snorkel data

  • Cover and substrate have multiple codes in some years. Currently, some are decoded and others are still coded. We need to keep them all decoded.
  • Need to map cover and substrate to a hierarchy (e.g. Gard has a cover and substrate coding system)
  • We should double check the data Erin pulled from the database. Was this filtered to Chinook? We want all species.
  • section_name, units_covered, unit, location, section_type, unit_type, section_number are all related to location. We need to do a better job getting these organized.

Distributions of key variables

species

Only Chinook are observed

TODO check that these were not filtered from dataset

## [1] NA        "chinook" "unknown"

fork length

fork length for chinook

Most Chinook observations are of fry (~40 mm), though there is some variation.

depth

depth

bank distance

count

cover

Instream cover codes

  • A: No apparent cover (1)
  • B: Small instream objects/small-medium woody debris (2)
  • C: Large instream objects/large woody debris (3)
  • D: Overhead objects (4)
  • E: Submerged vegetation (5)
  • F: Undercut bank (6)

Overhead cover codes

  • 0: No apparent overhead cover
  • 1: Overhanging vegetation within 0.5m above water surface
  • 2: Overhanging vegetation 0.5-2m above water surface
  • 3: Surface turbulence, bubble curtain

##  [1] NA      "A"     "BDEF"  "FE"    "D"     "BCDE"  "B"     "C"     "EC"   
## [10] "BE"    "BCDEF" "BDF"   "BD"    "E"     "EF"    "F"     "BEF"   "BCE"  
## [19] "BCEF"  "EB"    "CE"    "BCD"   "CDF"   "BDE"   "BF"    "CEF"   "BED"  
## [28] "CF"    "EFD"   "AE"    "BC"    "BEC"   "DE"    "CDEF"  "BEFD"  "CD"   
## [37] "ECF"   "BCF"   "BCDF"  "CDE"   "CED"   "DF"    "AG"    "ABD"   "AD"   
## [46] "FB"    "AB"    "bdf"
## # A tibble: 1 × 1
##       n
##   <int>
## 1  2061
## [1] NA  1  2  5  4  3  6
## # A tibble: 1 × 1
##       n
##   <int>
## 1  2062